Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MQE: track number of processed samples in each query #10232

Merged
merged 5 commits into from
Jan 8, 2025

Conversation

charleskorn
Copy link
Contributor

What this PR does

This PR adds support for tracking the number of samples processed in a query evaluated by MQE.

Which issue(s) this PR fixes or relates to

Resolves #10138

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • [covered by Mimir Query Engine #10067] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [n/a] about-versioning.md updated with experimental features.

@charleskorn charleskorn force-pushed the charleskorn/read-samples-tracking branch from 24164eb to 344bfc7 Compare December 13, 2024 00:36
@charleskorn charleskorn marked this pull request as ready for review December 13, 2024 00:55
@charleskorn charleskorn requested a review from a team as a code owner December 13, 2024 00:55
Copy link
Contributor

@tinitiuset tinitiuset left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for being so quick on this. Looks good to me.

// QueryStats tracks statistics about the execution of a single query.
//
// It is not safe to use this type from multiple goroutines simultaneously.
type QueryStats struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?

It would also make it easier to implement TotalSamplesPerStep if we want to support that too (which I think we do)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?

Given we don't support anything other than TotalSamples in MQE, I wanted to make this clear in the code by using a struct that only had a field for TotalSamples.

It would also make it easier to implement TotalSamplesPerStep if we want to support that too (which I think we do)

I don't think we want to do this unless there's a specific need for it - per-step stats are considered experimental and disabled by default in Prometheus, and are not possible to enable in Mimir as far as I can see. The docs for this also state that the value for each step should be the same as if the query was run as an instant query, so anyone who wanted this information could run the query as an instant query for the step(s) they're interested in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?

Given we don't support anything other than TotalSamples in MQE, I wanted to make this clear in the code by using a struct that only had a field for TotalSamples.

I would be happy with a comment in query.go, but I'm not opposed to a separate struct.

It would also make it easier to implement TotalSamplesPerStep if we want to support that too (which I think we do)

I don't think we want to do this unless there's a specific need for it - per-step stats are considered experimental and disabled by default in Prometheus, and are not possible to enable in Mimir as far as I can see. The docs for this also state that the value for each step should be the same as if the query was run as an instant query, so anyone who wanted this information could run the query as an instant query for the step(s) they're interested in.

Fair enough, but also thinking of any future stats. I don't mind a separate struct though.

pkg/streamingpromql/engine_test.go Show resolved Hide resolved
require.Equal(t, testCase.expectedTotalSamples, prometheusCount, "invalid test case: expected samples does not match value from Prometheus' engine")

mimirCount := runQueryAndGetTotalSamples(t, mimirEngine, testCase.expr, testCase.isInstantQuery)
require.Equal(t, testCase.expectedTotalSamples, mimirCount)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also compare the samples loaded as part of our test gauntlet if we expect it to be the same in all cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently difficult due to the optimisation in prometheus/prometheus#14097, as Prometheus' engine sometimes skips loading data for histograms if it's not needed. MQE does not yet have the same optimisation, so there are some expected differences in the total sample count from the two engines in some cases.

Given the tests in TestQueryStats, and the fact the statistics are informational and may differ between engines in the future due to other optimisations, I'm tempted to leave this as-is.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to leave it out if that's the case.

It might be interesting to see some much larger queries/time ranges etc to see if we are returning consistent results. We could perhaps use the same data generated from the benchmarks to create some large queries (of just floats since NH will be different). Then have a flag to RequireEqualResults to compare them etc.

This isn't a blocker.

@tinitiuset tinitiuset force-pushed the charleskorn/read-samples-tracking branch from 56ffb11 to 5dea7f9 Compare December 18, 2024 11:04
Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like how we have to iterate through the buffer vs counting the samples on filling the buffer, but otherwise lgtm. Is it possible to see a benchmark of range-vectors of histograms before approving?

@charleskorn
Copy link
Contributor Author

I don't really like how we have to iterate through the buffer vs counting the samples on filling the buffer, but otherwise lgtm. Is it possible to see a benchmark of range-vectors of histograms before approving?

Benchmarks show there is no statistically significant difference with this PR compared to main - the extra work is insignificant compared to everything else that needs to be done.

Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking 🚀

@charleskorn charleskorn merged commit 4ec3018 into main Jan 8, 2025
29 checks passed
@charleskorn charleskorn deleted the charleskorn/read-samples-tracking branch January 8, 2025 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Idea: Measure processed samples on MQE
3 participants